[IR] Add llvm clmul intrinsic #140301
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-llvm-ir
Author: Oscar Smith (oscardssmith)
Changes
This is the generic version of int_x86_pclmulqdq and riscv_clmul. So far I have only hooked this up for the RISCV backend, but the x86 backend should be pretty easy as well. This is my first LLVM PR, so please tell me everything that I've messed up.
Full diff: https://github.com/llvm/llvm-project/pull/140301.diff
4 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index a1ae6611acd3c..636f18f28610b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -10471,8 +10471,8 @@ its two operands.
.. note::
- The instruction is implemented as a call to libm's '``fmod``'
- for some targets, and using the instruction may thus require linking libm.
+ The instruction is implemented as a call to libm's '``fmod``'
+ for some targets, and using the instruction may thus require linking libm.
Arguments:
@@ -18055,6 +18055,54 @@ Example:
%r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11) ; %r = i8: 225 (0b11100001)
%r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8) ; %r = i8: 255 (0b11111111)
+.. clmul:
+
+'``clmul.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax
+"""""""
+
+This is an overloaded intrinsic. You can use ``llvm.clmul``
+on any integer bit width or vectors of integers.
+
+::
+
+ declare i16 @llvm.clmul.i16(i16 %a, i16 %b)
+ declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
+ declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
+ declare <4 x i32> @llvm.clmul.v4i32(<4 x i32> %a, <4 x i32> %b)
+
+Overview
+"""""""""
+
+The '``llvm.clmul``' family of intrinsic functions performs carryless multiplication
+(also known as xor multiplication) on its two arguments.
+
+Arguments
+""""""""""
+
+The arguments (%a and %b) and the result may be of integer types of any bit
+width, but they must have the same bit width. ``%a`` and ``%b`` are the two
+values that will undergo carryless multiplication.
+
+Semantics:
+""""""""""
+
+The '``llvm.clmul``' intrinsic computes the carryless product of ``%a`` and ``%b``, which is the result
+of applying the standard multiplication algorithm with every addition replaced by an exclusive or; the result is truncated to the bit width of the operands.
+The vector intrinsics, such as ``llvm.clmul.v4i32``, operate on a per-element basis and the element order is not affected.
+
+Examples
+"""""""""
+
+.. code-block:: llvm
+
+ %res = call i4 @llvm.clmul.i4(i4 1, i4 2) ; %res = 2
+ %res = call i4 @llvm.clmul.i4(i4 5, i4 6) ; %res = 14
+ %res = call i4 @llvm.clmul.i4(i4 -4, i4 2) ; %res = -8
+ %res = call i4 @llvm.clmul.i4(i4 -4, i4 -5) ; %res = 4
+
Arithmetic with Overflow Intrinsics
-----------------------------------
@@ -24244,14 +24292,14 @@ Examples:
.. code-block:: text
- %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
- ;; The operation can also be expressed like this:
+ %r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i64> %mask, i32 %evl)
+ ;; The operation can also be expressed like this:
- %addr = bitcast i64* %ptr to i8*
- ;; Create a vector of pointers %addrs in the form:
- ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
- %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
- %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
+ %addr = bitcast i64* %ptr to i8*
+ ;; Create a vector of pointers %addrs in the form:
+ ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+ %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+ %also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64* > %ptrs, <8 x i64> %mask, i32 %evl)
.. _int_experimental_vp_strided_store:
@@ -24295,7 +24343,7 @@ The '``llvm.experimental.vp.strided.store``' intrinsic stores the elements of
'``val``' in the same way as the :ref:`llvm.vp.scatter <int_vp_scatter>` intrinsic,
where the vector of pointers is in the form:
- ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
+ ``%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ... >``,
with '``ptr``' previously casted to a pointer '``i8``', '``stride``' always interpreted as a signed
integer and all arithmetic occurring in the pointer type.
@@ -24305,14 +24353,14 @@ Examples:
.. code-block:: text
- call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
- ;; The operation can also be expressed like this:
+ call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
+ ;; The operation can also be expressed like this:
- %addr = bitcast i64* %ptr to i8*
- ;; Create a vector of pointers %addrs in the form:
- ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
- %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
- call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
+ %addr = bitcast i64* %ptr to i8*
+ ;; Create a vector of pointers %addrs in the form:
+ ;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
+ %ptrs = bitcast <8 x i8* > %addrs to <8 x i64* >
+ call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
.. _int_vp_gather:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index e1a135a5ad48e..1857829910340 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1431,6 +1431,8 @@ let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
def int_fshr : DefaultAttrsIntrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>]>;
+ def int_clmul : DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>]>;
}
let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn,
@@ -2103,6 +2105,12 @@ let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
LLVMMatchType<0>,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty]>;
+ def int_vp_clmul : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
+ [ LLVMMatchType<0>,
+ LLVMMatchType<0>,
+ LLVMMatchType<0>,
+ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+ llvm_i32_ty]>;
def int_vp_sadd_sat : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
[ LLVMMatchType<0>,
LLVMMatchType<0>,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index fae2cda13863d..6167c375755fd 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -10348,6 +10348,7 @@ SDValue RISCVTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
return DAG.getNode(RISCVISD::MOPRR, DL, XLenVT, Op.getOperand(1),
Op.getOperand(2), Op.getOperand(3));
}
+ case Intrinsic::clmul:
case Intrinsic::riscv_clmul:
return DAG.getNode(RISCVISD::CLMUL, DL, XLenVT, Op.getOperand(1),
Op.getOperand(2));
@@ -14284,6 +14285,7 @@ void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Res));
return;
}
+ case Intrinsic::clmul:
case Intrinsic::riscv_clmul: {
if (!Subtarget.is64Bit() || N->getValueType(0) != MVT::i32)
return;
diff --git a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
index aa9e89bc20953..5017f9f4853b5 100644
--- a/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll
@@ -4,7 +4,7 @@
; RUN: llc -mtriple=riscv64 -mattr=+zbkc -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV64ZBC-ZBKC
-declare i64 @llvm.riscv.clmul.i64(i64 %a, i64 %b)
+declare i64 @llvm.clmul.i64(i64 %a, i64 %b)
define i64 @clmul64(i64 %a, i64 %b) nounwind {
; RV64ZBC-ZBKC-LABEL: clmul64:
@@ -26,7 +26,7 @@ define i64 @clmul64h(i64 %a, i64 %b) nounwind {
ret i64 %tmp
}
-declare i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+declare i32 @llvm.clmul.i32(i32 %a, i32 %b)
define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
; RV64ZBC-ZBKC-LABEL: clmul32:
@@ -34,7 +34,7 @@ define signext i32 @clmul32(i32 signext %a, i32 signext %b) nounwind {
; RV64ZBC-ZBKC-NEXT: clmul a0, a0, a1
; RV64ZBC-ZBKC-NEXT: sext.w a0, a0
; RV64ZBC-ZBKC-NEXT: ret
- %tmp = call i32 @llvm.riscv.clmul.i32(i32 %a, i32 %b)
+ %tmp = call i32 @llvm.clmul.i32(i32 %a, i32 %b)
ret i32 %tmp
}
|
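The semantics described in the LangRef text above (schoolbook multiplication with each addition replaced by xor, truncated to the operand width) can be sketched in plain C++. This is only an illustrative reference model, not code from the PR; the helper name `clmul_ref` and its `bits` parameter are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reference model (not an LLVM API): carryless multiply of the
// low `bits` bits of a and b, truncated to `bits` bits (1 <= bits <= 64).
constexpr uint64_t clmul_ref(uint64_t a, uint64_t b, unsigned bits) {
  const uint64_t mask = bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
  uint64_t res = 0;
  for (unsigned i = 0; i < bits; ++i) {
    // For every set bit i of b, xor in a shifted left by i.
    // An ordinary multiply would add here; xor drops the carries.
    if ((b >> i) & 1ULL)
      res ^= a << i;
  }
  return res & mask; // keep only the bits that fit in the result type
}
```

With `bits = 4`, the i4 operand -4 is the bit pattern 0b1100 and -5 is 0b1011, so `clmul_ref(0xC, 0xB, 4)` models the last i4 example and yields 0b0100.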
The title :) Should be "clmul" not "cmul". |
We need to make See existing examples like
Then, for RISC-V we need to make it |
@topperc looking at |
I don't know. |
You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm/CodeGen/ISDOpcodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/lib/CodeGen/IntrinsicLowering.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/Target/RISCV/RISCVISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/CodeGen/ISDOpcodes.h b/llvm/include/llvm/CodeGen/ISDOpcodes.h
index fc3b3b26c..589fb4522 100644
--- a/llvm/include/llvm/CodeGen/ISDOpcodes.h
+++ b/llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -751,7 +751,7 @@ enum NodeType {
ROTR,
FSHL,
FSHR,
-
+
/// Carryless multiplication operator
CLMUL,
diff --git a/llvm/lib/CodeGen/IntrinsicLowering.cpp b/llvm/lib/CodeGen/IntrinsicLowering.cpp
index 9111790e0..09ac7ce20 100644
--- a/llvm/lib/CodeGen/IntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/IntrinsicLowering.cpp
@@ -200,7 +200,8 @@ static Value *LowerCTLZ(LLVMContext &Context, Value *V, Instruction *IP) {
}
/// Emit the code to lower clmul of V1, V2 before the specified instruction IP.
-static Value *LowerCLMUL(LLVMContext &Context, Value *V1, Value *V2, Instruction *IP) {
+static Value *LowerCLMUL(LLVMContext &Context, Value *V1, Value *V2,
+ Instruction *IP) {
IRBuilder<> Builder(IP);
@@ -282,7 +283,8 @@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
break;
case Intrinsic::clmul:
- CI->replaceAllUsesWith(LowerCLMUL(Context, CI->getArgOperand(0), CI->getArgOperand(1), CI));
+ CI->replaceAllUsesWith(
+ LowerCLMUL(Context, CI->getArgOperand(0), CI->getArgOperand(1), CI));
break;
case Intrinsic::cttz: {
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 3b5a8bf5c..dd6c8ebbc 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -208,7 +208,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::VP_ADD:
case ISD::VP_SUB:
case ISD::VP_MUL:
- case ISD::CLMUL: Res = PromoteIntRes_SimpleIntBinOp(N); break;
+ case ISD::CLMUL:
+ Res = PromoteIntRes_SimpleIntBinOp(N);
+ break;
case ISD::ABDS:
case ISD::AVGCEILS:
@@ -5422,15 +5424,14 @@ void DAGTypeLegalizer::ExpandIntRes_FunnelShift(SDNode *N, SDValue &Lo,
Hi = DAG.getNode(Opc, DL, HalfVT, Select3, Select2, NewShAmt);
}
-void DAGTypeLegalizer::ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo,
- SDValue &Hi) {
+void DAGTypeLegalizer::ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo, SDValue &Hi) {
// Values numbered from least significant to most significant.
SDValue LL, LH, RL, RH;
GetExpandedInteger(N->getOperand(0), LL, LH);
GetExpandedInteger(N->getOperand(1), RL, RH);
EVT HalfVT = LL.getValueType();
SDLoc DL(N);
-
+
// CLMUL is carryless so Lo is computed from the low half
Lo = DAG.getNode(ISD::CLMUL, DL, HalfVT, LL, RL);
// the high bits not included in CLMUL(A,B) can be computed by
@@ -5438,14 +5439,15 @@ void DAGTypeLegalizer::ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo,
// Therefore we can compute the 2 hi/lo cross products
// and the overflow of the low product
// and xor them together to compute HI
- // TODO: if the target supports a widening CLMUL or a CLMULH we should probably use that
+ // TODO: if the target supports a widening CLMUL or a CLMULH we should
+ // probably use that
SDValue BitRevLL = DAG.getNode(ISD::BITREVERSE, DL, HalfVT, LL);
SDValue BitRevRL = DAG.getNode(ISD::BITREVERSE, DL, HalfVT, RL);
SDValue BitRevLoHi = DAG.getNode(ISD::CLMUL, DL, HalfVT, BitRevLL, BitRevRL);
SDValue LoHi = DAG.getNode(ISD::BITREVERSE, DL, HalfVT, BitRevLoHi);
SDValue One = DAG.getShiftAmountConstant(1, HalfVT, DL);
Hi = DAG.getNode(ISD::SRL, DL, HalfVT, LoHi, One);
-
+
SDValue HITMP = DAG.getNode(ISD::CLMUL, DL, HalfVT, LL, RH);
Hi = DAG.getNode(ISD::XOR, DL, HalfVT, Hi, HITMP);
HITMP = DAG.getNode(ISD::CLMUL, DL, HalfVT, LH, RL);
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 255a587cb..51f887295 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -508,7 +508,7 @@ private:
void ExpandIntRes_Rotate (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_FunnelShift (SDNode *N, SDValue &Lo, SDValue &Hi);
- void ExpandIntRes_CLMUL (SDNode *N, SDValue &Lo, SDValue &Hi);
+ void ExpandIntRes_CLMUL(SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_VSCALE (SDNode *N, SDValue &Lo, SDValue &Hi);
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index d32a68fb7..76bedcbb3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8131,8 +8131,7 @@ SDValue TargetLowering::expandFunnelShift(SDNode *Node,
return DAG.getNode(ISD::OR, DL, VT, ShX, ShY);
}
-SDValue TargetLowering::expandCLMUL(SDNode *Node,
- SelectionDAG &DAG) const {
+SDValue TargetLowering::expandCLMUL(SDNode *Node, SelectionDAG &DAG) const {
SDLoc DL(Node);
EVT VT = Node->getValueType(0);
SDValue V1 = Node->getOperand(0);
@@ -8146,10 +8145,10 @@ SDValue TargetLowering::expandCLMUL(SDNode *Node,
// subvector.
if (VT.isVector() && (!isPowerOf2_32(NumBitsPerElt) ||
(!isOperationLegalOrCustom(ISD::SRL, VT) ||
- !isOperationLegalOrCustom(ISD::SHL, VT) ||
- !isOperationLegalOrCustom(ISD::XOR, VT) ||
- !isOperationLegalOrCustom(ISD::AND, VT) ||
- !isOperationLegalOrCustom(ISD::SELECT, VT))))
+ !isOperationLegalOrCustom(ISD::SHL, VT) ||
+ !isOperationLegalOrCustom(ISD::XOR, VT) ||
+ !isOperationLegalOrCustom(ISD::AND, VT) ||
+ !isOperationLegalOrCustom(ISD::SELECT, VT))))
return DAG.UnrollVectorOp(Node);
SDValue Res = DAG.getConstant(0, DL, VT);
|
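The comments in ExpandIntRes_CLMUL above rely on two facts: the high half of a widening carryless product can be recovered as bitreverse(clmul(bitreverse(a), bitreverse(b))) shifted right by one, and the high half of the double-width result is that value xored with the two cross products. The following standalone C++ sketch checks this on plain integers under the assumption of 32-bit halves; every function name here is hypothetical, and nothing in it uses the SelectionDAG API.

```cpp
#include <cstdint>

// Low 32 bits of the carryless product (shift-and-xor reference loop).
constexpr uint32_t clmul32(uint32_t a, uint32_t b) {
  uint32_t res = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      res ^= a << i;
  return res;
}

// Reverse the bit order of a 32-bit value.
constexpr uint32_t bitreverse32(uint32_t x) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 32; ++i)
    r |= ((x >> i) & 1u) << (31 - i);
  return r;
}

// High 32 bits of the 64-bit carryless product of a and b. The clmul of the
// reversed inputs, reversed again, holds product bits 31..62; the shift by
// one aligns them to bits 32..63 (bit 63 of the product is always zero).
constexpr uint32_t clmulh32(uint32_t a, uint32_t b) {
  return bitreverse32(clmul32(bitreverse32(a), bitreverse32(b))) >> 1;
}

// 64-bit truncating clmul assembled from 32-bit halves, mirroring the Lo/Hi
// split: Lo = clmul(LL, RL); Hi = clmulh(LL, RL) ^ clmul(LL, RH) ^ clmul(LH, RL).
constexpr uint64_t clmul64_split(uint64_t a, uint64_t b) {
  const uint32_t ll = static_cast<uint32_t>(a);
  const uint32_t lh = static_cast<uint32_t>(a >> 32);
  const uint32_t rl = static_cast<uint32_t>(b);
  const uint32_t rh = static_cast<uint32_t>(b >> 32);
  const uint32_t lo = clmul32(ll, rl);
  const uint32_t hi = clmulh32(ll, rl) ^ clmul32(ll, rh) ^ clmul32(lh, rl);
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Direct 64-bit reference used to cross-check the split version.
constexpr uint64_t clmul64_direct(uint64_t a, uint64_t b) {
  uint64_t res = 0;
  for (unsigned i = 0; i < 64; ++i)
    if ((b >> i) & 1ULL)
      res ^= a << i;
  return res;
}
```

The product LH * RH does not appear because it only contributes to bits 64 and above, which a truncating 64-bit clmul discards.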
I've now added RISC-V tests. I assume I need SelectionDAG tests to show that the fallbacks implemented here are correct, but I don't see any guidance in https://releases.llvm.org/8.0.0/docs/ExtendingLLVM.html (or in past PRs that I've looked at) on where these tests should go. |
also the RiscV tests I've added are very much not working. Running
Any idea what this means? |
This is the generic version of int_x86_pclmulqdq and riscv_clmul, as discussed in https://discourse.llvm.org/t/rfc-carry-less-multiplication-instruction/55819/26 (and will allow implementations for powerpc and aarch64 backends that have this instruction but no backend intrinsic).
So far I have only hooked this up for the RISCV backend, but the x86 backend should be pretty easy as well.
This is my first LLVM PR, so please tell me everything that I've messed up.